Abstract: Automatic land cover classification maps were developed from Airborne Hyperspectral Scanner (HyMAP) imagery acquired May 8, 2000 over Smith Island, VA, a barrier island in the Virginia Coast Reserve. Both unsupervised and supervised classification approaches were used to create these products to evaluate relative merits and to develop models that would be useful to natural resource managers at higher spatial resolution than has been available previously. Ground surveys made by us in late October and early December 2000 and again in May, August, and October 2001 and May 2002 provided ground truth data for 20 land cover types. Locations of pure land cover types recorded with global positioning system (GPS) data from these surveys were used to extract spectral end-members for training and testing supervised land cover classification models. Unsupervised exploratory models were also developed using spatial-spectral windows and projection pursuit (PP), a class of algorithms suitable for extracting multimodal views of the data. PP projections were clustered by ISODATA to produce an unsupervised classification. Supervised models, which relied on the GPS data, used only spectral inputs because for some categories in particular areas, labeled data consisted of isolated single-pixel waypoints. Both approaches to the classification problem produced consistent results for some categories such as Spartina alterniflora, although there were differences for other categories. Initial models for supervised classification based on 112 HyMAP spectra, labeled in ground surveys, obtained reasonably consistent results for many of the dominant categories, with a few exceptions. For an invasive plant species, Phragmites australis, a particular concern of natural resource managers, this approach initially had an excessively high false-alarm rate. Increasing the number of spectral training samples by an order of magnitude and making concomitant improvements to the georectification led to dramatic improvements in this and other categories. The unsupervised spatial-spectral approach also found a cluster closely associated with Phragmites patches near the thicket boundary, but this approach did not identify the exposed Phragmites. Examples of in situ reflectance measurements obtained with an Analytical Spectral Devices FR spectrometer in early May 2001 are compared against HyMAP image spectra at model-predicted pixels and at validated GPS waypoints.

Index Terms: Barrier islands, hyperspectral, in situ spectrometry, invasive plant species, land cover classification, neural networks, principal component analysis, projection pursuit, supervised classification, unsupervised classification.

I.  Introduction:  The  Virginia  Coast  Reserve 

A HyMAP [1], [31] scene of Smith Island, VA, acquired on May 8, 2000, served as the basis of the present study (Fig. 1). Smith Island is one of a series of barrier islands in the Virginia Coast Reserve (VCR) and the site of the University of Virginia’s ongoing Long Term Ecological Research (LTER) program [32], [38]. The most extensive survey of the island dates from 1974 [32], [35] and was based on ground observations and interpretation of false-color infrared imagery for a set of 16 barrier islands that encompass the VCR. These historical reference data comprised 26 land cover types. To develop our automatic land cover classification models, we chose a somewhat different approach, attempting to achieve species-level classification in many instances, while considering in some cases plant communities that were similar to those described in [35]. Our land cover classification models consisted of 16 to 19 categories. However, for purposes of this introduction, we have grouped the land cover into five or six principal categories, some of which equate to those described in [35], while others are aggregates of several of these categories. New definitions for coastal vegetation are presently under development by the state of Virginia [18].

Particularly in wetlands research and coastal applications, past emphasis has been on either 1) broad-band sensors such as Landsat TM [16], [28] or 2) hyperspectral sensors at lower spatial resolution [22]. For modeling regional scales with the former, the National Oceanic and Atmospheric Administration’s C-CAP protocol has been widely used. A cornerstone of this has been the cluster-busting algorithm developed by Jensen [27], which is a labor-intensive, though highly accurate, approach. Likewise, in coastal applications, the dominant approach for hyperspectral modeling has been spectral linear mixing models (e.g., see [22]) applied to AVIRIS imagery at resolutions of approximately 20 m per pixel. Other analyses have considered the use of vegetation indices for extracting biophysical parameters, comparing both Landsat TM and AVIRIS with in situ spectrometry measurements [48]. One of our goals in using a hyperspectral sensor with a higher spatial resolution of 4.5 m was to be able to discriminate rapidly varying land cover types seen, for example, in the transition zone from the lagoonal shore to the upland. On Smith Island, six to seven distinct vegetation zones may occur in a distance as short as 50-75 m. Although we do not explore mixture models in the present study, they will be compared with the methods presented here in a future publication.

[Fig. 1 panel titles: Landsat TM RGB Composite of Northampton County, Virginia, Including the Virginia Coast Reserve; HyMAP RGB Composite, Smith Island, Virginia.]

Fig. 1. (Left) RGB composite of the red, green, and blue channels from a Landsat Thematic Mapper (TM) image taken in April 1998 of Northampton County, VA, showing a subset of the islands known as the Virginia Coast Reserve. Smith Island is highlighted in the box. (Right) RGB composite from 126-channel HyMAP imagery of Smith Island, VA and the southern portion of Myrtle Island, acquired May 8, 2000. A portion of Myrtle Island has been omitted.

The spatial distribution of land cover types included in our models varied considerably. Categories such as Myrica cerifera (bayberry) thicket occur only in the southern end of Smith Island, striating the island in dense bands of vegetation. These thickets are typically tens of meters in width and can extend in some instances nearly the width of the island (about 2 km). Categories such as these, whose spatial extent is frequently greater than the resolution cell of the sensor (the H-resolution case described in [42]), are therefore amenable to modeling that uses supervised classification, at least in the final stages of processing. In contrast, in other areas some vegetation categories have a spatial extent that is of the order of a pixel or less (the L-resolution case [42]). In some cases, the spatial extent may be of the order of a pixel or less in one dimension while spanning several pixels or more in the other. The latter occurs in some instances for the invasive plant species Phragmites australis in the southern end of Smith Island (not all stands are so narrow; the width varies considerably). In this part of the island, Phragmites australis, where it occurs, typically forms a narrow band of vegetation in the ecotone between the upland thicket and the brackish and fresh water marshes in the swale immediately adjacent. In most cases for our validation testing, the width of the stands that we considered was at least a couple of pixels, although it is not uncommon to find stands whose width (extent perpendicular to the thicket line) is on the order of a pixel. In a sense, this is the ideal candidate for L-resolution methods, which assume that in at least one dimension the category extent may be a pixel or less, but the spectral pattern associated with the category is repeated in some regular spatial distribution that can be detected. This is our motivation for also considering spatial-spectral models that use unsupervised feature extraction and classification based on projection pursuit and principal component analysis.

As just mentioned, some vegetation communities have spatial extents that may be only a few pixels at the HyMAP spatial resolution of 4.5 m, so this resolution forms an upper bound on the ideal spatial resolution. The utility of land cover classification models is, of course, determined by the end-user [36], [37]. Beyond the narrow goal of achieving ecological modeling at high resolution, there are practical reasons why this kind of data will benefit natural resource managers. For example, as just described, the invasive plant species Phragmites australis may exist in patches whose spatial extent may be on the order of the pixel size of HyMAP in at least one dimension. Likewise, because it has spectral characteristics that are similar to other wetland plants, it is unlikely that systems with a few broad spectral channels would be able to discriminate it, especially when it occurs in close proximity, as it often does on Smith Island, to other vegetation types such as the Myrica cerifera thicket. Although there is some debate as to how problematic Phragmites is [39], many natural resource managers agree that it supplants other wetland types, disrupting ecosystem balance, and Phragmites control and eradication programs are not uncommon.

Even within a single category, variations in spatial extent occur. For instance, one of the primary constituents of the low marsh vegetation (Fig. 2), Spartina alterniflora (Smooth Cordgrass), occurs in large monotypic stands in the northern end of Smith Island, while at the southern end of the island, it occurs in one or two narrow bands of vegetation at the water’s edge on the lagoonal (western) shore, and in small zones in the brackish swales that cross the island.

High marsh species (Fig. 2) include Salicornia virginica (Perennial Glasswort), Limonium carolinianum (Sea Lavender), Borrichia frutescens (Sea Ox-eye), Iva frutescens (Marsh-elder), Suaeda linearis and Suaeda maritima (Sea-blite), and Spartina patens (Salt-Hay or Saltmeadow Cordgrass).¹ The upper end of the high marsh frequently has a zone of “wrack,” the dead, matted detritus of the previous year’s growth, which typically marks the mean high-water line associated with tidal influences. The swales (Fig. 2) that cross the southern end of Smith Island contain brackish and fresh-water marshes. Swale vegetation includes Distichlis spicata (Saltgrass), Spartina patens, Juncus roemerianus (Needle Rush), Scirpus robustus (Saltmarsh Bulrush), and Iva frutescens.

Narrow upland zones (Fig. 2) alternate with swales across the southern end of the island. Here the typical vegetation consists of shrubs such as Myrica cerifera (Bayberry), the dominant vegetation, and Baccharis halimifolia (Groundsel-tree), with attendant vegetation such as Smilax spp. (Greenbriar). Stands of hardwoods and pine, such as Pinus taeda (Loblolly Pine), also occur in some of the upland zones. In these areas, it is common to find shrubs such as Myrica cerifera in the understory.

Flats (Fig. 2) appear throughout the island. These consist of mudflats, wash flats, and salt flats or salt pannes. Wash flats result, for example, from sudden storm surge events in which the dune line is breached. Salt pannes occur in places where water floods an area and evaporates, leaving behind a significant amount of salt. The high salinity tends to kill off most vegetation, and typically only the most salt-tolerant plants such as Salicornia virginica will survive in small clumps; wash flats are often predecessors of salt pannes [35].

The beach zone (Fig. 2) is highly variable. In the northern end of Smith Island, exposed peat outcrops are present in the surf zone. These are the decomposed residue of what was once salt marsh, and they serve as a reminder that the island is undergoing constant change. In the foredune zone (also Fig. 2), “wrack” is frequently found, as is, in summer, a low band of herbaceous vegetation composed principally of plants such as Cakile edentula (Sea Rocket) and Salsola kali (Russian Thistle). The dune line (Fig. 2) typically comprises plant species such as Ammophila breviligulata (American Beachgrass), Uniola paniculata (Sea Oats), Solidago sempervirens (Salt Marsh Goldenrod), and in some cases Panicum amarum (Seaside Panicum). The back dune is dominated by vegetation such as Spartina patens, Ammophila breviligulata, and Andropogon spp. (Broomsedge family).

¹Common names of coastal vegetation may vary somewhat from author to author, as do definitions of species names listed in italics; in this paper, we have used [15] and [43].

II.  HyMAP  Data  for  Smith  Island,  VA 

The HyMAP imagery was atmospherically corrected using ATREM/EFFORT [13] by Analytical Imaging and Geophysics LLC (AIG) prior to delivery. The Smith Island scene was acquired at 4.5-m resolution with 128 spectral channels; the final EFFORT product was the surface reflectance contained in 126 spectral channels ranging from 445 to 2486 nm. The image was acquired near high tide, so there is a significant degree of inundation in the wetlands, especially in the salt marsh. For the purposes of automated model development, we preprocessed the data on a per-sample basis in a number of different ways (Fig. 3). Since the data points labeled in the global positioning system (GPS) surveys consisted of a mix of both isolated points and areas, the supervised automatic classification models used only the single-pixel spectrum as input, while the unsupervised models did not need to satisfy this constraint and, therefore, could ingest both single-pixel spectra and spatial-spectral windows.

III.  Field  Observations;  Geolocated  Spectra 
and  In  Situ  Spectrometry 

We compared the unsupervised and supervised automatic classification category maps against in situ observations made during two days of field observations and GPS surveys conducted in October and December 2000, a week of surveys carried out with GPS on May 7-11, 2001, two weeks of differential GPS (DGPS) surveys conducted during August 20-23 and October 8-12, 2001, and further surveys on May 3-5 and May 13-15, 2002. During these trips, typical vegetation categories were identified, and positions were recorded using a GPS or DGPS. These same waypoints were also used to generate supervised classification maps. During the May 7-11, 2001 and May 13-15, 2002 field trips, we also measured in situ reflectance with an Analytical Spectral Devices (ASD) FR spectrometer, which covers a spectral range similar to that of HyMAP. Our DGPS survey equipment consisted of a Trimble Geoexplorer 3 and Beacon-on-a-Belt. During these weeks, we also surveyed four other islands to the extent that time permitted, taking data on Hog, Cobb, Wreck, and Myrtle, in addition to Smith. Equipment problems prevented additional spectral measurements; however, two weeks prior to the August 2001 survey at the VCR, we were able to acquire ASD measurements at another site in southern New Jersey. During the August field trip and one week after the October field trip, airborne hyperspectral data were acquired by PROBE2 [17] for all six of the VCR islands in our study area for comparison against our May 2000 HyMAP data. These PROBE2 data, which will be the subject of future papers, were acquired because we wish to understand the effect that seasonal changes in the land cover have on spectral characteristics. As described below, models that were produced for the spring HyMAP data may not necessarily apply to data taken in the summer or fall. Likewise, tidal influences can have a significant impact on marsh vegetation and its associated spectra because of the degree of inundation present.
[Fig. 2 panel titles: Upland; Mudflat; Peat Outcrop; Fore-dune, Dune, and Back-Dune; Low Marsh; High Marsh; Swale, Fresh Water and Brackish Marshes.]

Fig. 2. Typical Smith Island land cover. (First row, left) Upland zone: Myrica cerifera thickets and some stands of hardwood and Pinus taeda (Loblolly Pine). (First row, middle) Typical mudflat near salt marsh edge. (First row, right) Peat outcrop in surf zone. (Second row, left) Foredune vegetation: primarily Cakile edentula (Sea Rocket) (inset) and Salsola kali (Russian Thistle). (Second row, middle) Dune vegetation and nearby backdune: primarily Ammophila breviligulata (American Beachgrass) (foreground), and occasionally Uniola paniculata (Sea Oats) (background). (Second row, right) Inland portion of backdune: predominantly Andropogon spp. (Broomsedge family). (Third row, left) Spartina alterniflora (Smooth Cordgrass), dominant in low marsh; (third row, right) Borrichia frutescens, typical high marsh plant; (third row, middle) “wrack.” Brackish marsh dwellers: (fourth row, left) Juncus roemerianus (Needle Rush), (fourth row, middle and inset) Scirpus robustus (Saltmarsh Bulrush), and (fourth row, third column) Phragmites australis, an invasive plant species; (fourth row, fourth column) Distichlis spicata, a dominant swale grass.
Fig. 3. Processing configurations for automated land cover classification models. (Top) Supervised models used georeferenced HyMAP spectra labeled during GPS and DGPS ground surveys. The supervised classifier was BPCE. Some models used PCA or PP for feature extraction/dimensionality reduction as a precursor to BPCE. (Bottom) Single-pixel and spatial-spectral windows were derived from a subset of the HyMAP data for the southern end of Smith Island. PP-filtered data were passed to ISODATA. For both unsupervised and supervised approaches, models were produced for the entire island.

We recorded the environment at many of the waypoints using digital still photographs and video. Based on these field observations, we initially defined a set of 16 categories, some of which appear in Table I or were aggregates of categories listed there. We ended up using all but the foredune category in the final set as the basis of our supervised classification models (primarily because the foredune vegetation in early May will typically be nascent and sparse or completely absent). After the May and August 2001 surveys, two additional categories were added (Table I): Peat outcrop and Scirpus robustus, and we split two aggregate categories into their primary constituent plant species: backdune became Andropogon spp. and Ammophila breviligulata, and the thicket vegetation was separated into Pine/Hardwood complex and Myrica cerifera-dominated thicket. We created spectral libraries from individual HyMAP spectra extracted at the associated waypoint, or where appropriate, small regions of interest (ROIs) bounded by GPS waypoints. After the DGPS data were collected, points, lines, and areas were available with an accuracy estimated to be <1-5 m, similar to the spatial resolution of the May HyMAP data. These were used to train and test supervised automatic classification models more rigorously as described below. The DGPS ground data also were used to improve georectification of the imagery.
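As an illustration of how labeled waypoints can be turned into a spectral library, the following minimal sketch maps projected coordinates to pixel indices and pulls the corresponding single-pixel spectra. It is not the procedure used in this study: it assumes a north-up image with square 4.5-m pixels and a known upper-left map coordinate, and the function names (`waypoint_to_pixel`, `extract_spectra`), the coordinates, and the random reflectance cube are all hypothetical.

```python
import numpy as np

def waypoint_to_pixel(easting, northing, x_origin, y_origin, pixel_size=4.5):
    """Map a projected coordinate (metres) to (row, col) of a north-up image.

    x_origin, y_origin are the map coordinates of the upper-left corner of
    the upper-left pixel (hypothetical values in the example below).
    """
    col = int(np.floor((easting - x_origin) / pixel_size))
    row = int(np.floor((y_origin - northing) / pixel_size))
    return row, col

def extract_spectra(cube, waypoints, x_origin, y_origin, pixel_size=4.5):
    """Collect labels and spectra for waypoints that fall inside the cube.

    cube: array of shape (rows, cols, bands) holding surface reflectance.
    waypoints: iterable of (label, easting, northing) tuples.
    """
    rows, cols, _ = cube.shape
    labels, spectra = [], []
    for label, easting, northing in waypoints:
        r, c = waypoint_to_pixel(easting, northing, x_origin, y_origin, pixel_size)
        if 0 <= r < rows and 0 <= c < cols:  # skip waypoints outside the scene
            labels.append(label)
            spectra.append(cube[r, c, :])
    return labels, np.array(spectra)

# Synthetic stand-in for a 126-channel reflectance cube (not real data).
rng = np.random.default_rng(0)
cube = rng.random((100, 80, 126))
x0, y0 = 400000.0, 4110000.0  # hypothetical upper-left map coordinate
waypoints = [
    ("Phragmites australis", 400010.0, 4109990.0),
    ("Spartina alterniflora", 400100.0, 4109800.0),
    ("Water", 399000.0, 4109800.0),  # lies west of the image and is skipped
]
labels, library = extract_spectra(cube, waypoints, x0, y0)
```

With positional accuracy comparable to the pixel size, a waypoint near a pixel boundary can land in a neighboring pixel, which is one reason improved georectification matters for single-pixel training data.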

IV.  Methods 

Both  supervised  and  unsupervised  classification  models  of 
the  land  cover  were  produced.  In  this  section,  we  outline  how 
the  models  were  produced. 

A.  Unsupervised  Feature  Extraction  and  Classification 

Unsupervised feature extraction algorithms were used for two purposes in this study. In both cases, these fulfilled the role of dimensionality reduction as a precursor stage prior to the final classification algorithm, either unsupervised or supervised (Fig. 3). The two unsupervised feature extraction algorithms that we used were the projection pursuit (PP) algorithm described in [8] and the well-known principal component analysis (PCA) algorithm [47] that is popular in the remote sensing literature (e.g., see [23] and [44], and many others).

TABLE I

(1) Phragmites australis
(2) Spartina alterniflora
(3) Spartina patens
(4) Salicornia virginica
(5) Borrichia frutescens
(6) Juncus roemerianus
(7) Water
(8) Distichlis spicata
(9) Scirpus spp.
(10) “Wrack”
(11) Mudflat/saltflat
(12) Ammophila breviligulata
(13) Beach/sand
(14) Uniola paniculata
(15) Andropogon spp.
(16) Myrica cerifera-dominated Thicket
(17) Pine/hardwood complex
(18) Peat Outcrop
(19) Iva frutescens
(20) Foredune Vegetation

The underlying philosophies of PP and PCA are quite different. PCA uses the directions of maximal variance and derives an orthonormal set of basis vectors to identify significant structure in the data; these views of the data are not always easily interpreted with respect to specific underlying categories because of the orthogonality requirement [7], [8], [14]. Because PCA looks for directions of maximal variation in the data, it is incapable of detecting multimodal and other non-Gaussian departures that do not happen to be parallel, or nearly parallel, to the principal axes of the projected data distribution. In contrast, PP [7]–[10], [12], [20], [21], [46] uses higher order statistical information to overcome this difficulty and identify directions in which the projected data distribution (view) is non-Gaussian or multimodal.
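The contrast between variance-driven and non-Gaussianity-driven projections can be illustrated on synthetic data. The following minimal sketch is not the PP algorithm of [8]; it assumes a much simpler stand-in index, the absolute excess kurtosis, scanned over directions in two dimensions. The data, the index, and all names are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Axis 0: unimodal Gaussian with large variance.
# Axis 1: two tight clusters near -2 and +2 (bimodal, smaller variance).
x0 = rng.normal(0.0, 5.0, n)
x1 = rng.choice([-2.0, 2.0], n) + rng.normal(0.0, 0.3, n)
X = np.column_stack([x0, x1])
X = X - X.mean(axis=0)

# PCA: the leading eigenvector of the covariance matrix is the
# direction of maximal variance.
evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
pc1 = evecs[:, np.argmax(evals)]

def excess_kurtosis(z):
    """Excess kurtosis of a 1-D sample (zero for a Gaussian)."""
    z = (z - z.mean()) / z.std()
    return np.mean(z ** 4) - 3.0

# Stand-in projection index: scan unit directions over a half circle and
# keep the one whose projected data depart most from Gaussian kurtosis.
angles = np.linspace(0.0, np.pi, 181)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])
scores = np.array([abs(excess_kurtosis(X @ d)) for d in dirs])
best = dirs[np.argmax(scores)]

print("PC1 direction:            ", np.round(pc1, 3))
print("Most non-Gaussian direction:", np.round(best, 3))
```

In this construction the leading principal component follows the high-variance Gaussian axis, whereas the non-Gaussianity index selects the axis that separates the two clusters. Truncating to the leading principal components would therefore discard the direction that carries the cluster structure.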

Only within the last ten years has PP been applied in the field of remote sensing (see [2]–[5], [7], [8], [25], [29], and [30]) and in other disciplines (see [19], [26], [33], and [34]). The PP algorithm described in [8] (the PP algorithm used in this paper) is based on an algorithm originally proposed in [20]. However, in [8], projections are optimized simultaneously rather than in residual subspaces, as is sometimes the case in PP algorithms [21], [24], and projections are nonlinear, in order to remove sensitivity to outliers, rather than the linear form found in [20]. Although further details are provided in [8], the basic idea is that a cost function, emphasizing both intracluster spread and compactness within each cluster, is to be optimized. This function of the projected data distribution is the product of two functions, one measuring compactness of the data projection within a particular search scale and another measuring the spread of the data in that projection. The user defines a range of search scales, $\alpha_k$, that correspond to fractions of the standard deviation of projected data distributions onto initially selected random directions (the projection vectors) in pattern input space; $\alpha_k$ is chosen at random within the user-specified range, and one $\alpha_k$ is associated with each data projection.

Fig. 4. (Top, left) Sixteen-category land cover supervised BPCE classification based on 112 georeferenced HyMAP spectral end-members labeled from GPS field surveys in October and December 2000. Prediction of salt marsh vegetation in the north end, and many of the marsh and swale categories to the south appear consistent. Gaps in the center of salt marsh zones to the north are areas of heavy inundation which were declared as water by the model. Biggest errors occurred for Phragmites australis, Juncus roemerianus, and Uniola paniculata, all of which had high false-alarm rates. (Top, right) Nineteen-category supervised land cover classification based on 3656 HyMAP spectral end-members, labeled from DGPS and GPS surveys for training the model, showing dramatic reduction in false-alarm rate for these categories. (Bottom, left) RGB composite of three PP projections of HyMAP spatial-spectral windows; (bottom, right) 34-category land cover map produced by ISODATA clustering in a five-dimensional PP projection space, including the three PP projections shown in the RGB composite.

The Friedman-Tukey Projection Index [20], $I$, on which our projection index is based, was the product of a trimmed variance $S$ and a compactness function $N$:

$$\text{Maximize: } I(c_k) = S(c_k)\,N(c_k) \tag{1}$$

$$S(c_k) = \frac{\sum_{\mu=1}^{M_p-m}\left(c_k(\mu) - E(c_k)\right)^2}{M_p - m} \tag{2}$$

$$N(c_k) = \sum_{\mu=1}^{M_p}\sum_{\nu=1}^{M_p} g(r_k(\mu,\nu))\,\theta(R - r_k(\mu,\nu)) \tag{3}$$

$$\text{with } r_k(\mu,\nu) = \left|c_k(\mu) - c_k(\nu)\right| \tag{4}$$

$$c_k(\mu) = w_k \cdot f(\mu) \tag{5}$$

where $c_k(\mu)$ is the $k$th data projection of the $\mu$th sample vector, denoted $f(\mu)$, onto the unit projection vector $w_k$; $\theta$ is a step function; $R$ is a scalar compactness or cluster scale; $g(r_k(\mu,\nu))$ is a monotonically decreasing function of the distance between projected sample pairs $r_k(\mu,\nu)$; $M_p$ is the number of samples; and $m$ is the number of outliers removed in the trimmed variance. We replaced their projection index with $I^*$:

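$$\text{Maximize: } I^*(\tilde{r}_k(\mu,\nu)) = R(\tilde{r}_k(\mu,\nu))\,D(\tilde{r}_k(\mu,\nu)) \tag{6}$$

$$D(\tilde{r}_k(\mu,\nu)) = E_{\text{pairs}(\mu,\nu)}\, g(\tilde{r}_k(\mu,\nu)) \tag{7}$$

$$R(\tilde{r}_k(\mu,\nu)) = E_{\text{pairs}(\mu,\nu)} \cdot \left[\,1 - g(\tilde{r}_k(\mu,\nu))\,\right] \tag{8}$$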
Fig. 5. Extract from four land cover classifications, showing results for the southern end of Smith Island. (Top, left) BPCE model based on original 112 HyMAP spectral end-members, shows high false-alarm rate for Phragmites australis, Juncus roemerianus, and Uniola paniculata. (Bottom, left) BPCE classification using 3656 spectral end-members with improved georeferencing, showing dramatic reduction in false alarms; model also discriminates some dune vegetation types, and separates Myrica cerifera from Pine/Hardwood complex. (Top, right) PP-BPCE composite, using expanded spectral set, and (bottom, right) PCA-BPCE composite trained on same. PCA-BPCE shows higher rate of false alarms for Phragmites australis than PP-BPCE.


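$$\text{with } \tilde{r}_k^2(\mu,\nu) = \left(\tilde{c}_k(\mu) - \tilde{c}_k(\nu)\right)^2$$

$$g(\tilde{r}_k(\mu,\nu)) = \ldots \tag{9}$$

$$\tilde{c}_j(\mu) = \sigma(c_j(\mu)) \tag{10}$$

$$\text{where } \sigma(x) = a\tanh(a\lambda x) \quad (a,\ \lambda\ \text{constants}) \tag{11}$$

$$c_j(\mu) = w_j \cdot f(\mu) + b_j \tag{12}$$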

where $g$ is a continuous compactness function of a nonlinear projection $\tilde{c}_i(\mu)$; $D$ measures spread by sampling pairs of projections and approaches asymptotically a constant weight outside scale $\alpha_k$; and $E_{\text{pairs}(\mu,\nu)}$ signifies the expected value over projected sample pairs. Other differences included 1) optimization of multiple projections at the same time, rather than serially, and the use of a coupling matrix $L_{ij}$ that is adjusted via gradient ascent to maximize the relative entropy of the data projections, and 2) our use of a saturating nonlinearity $\sigma$ to remove sensitivity to outliers, meaning that all data points can be included. Each compactness function $g(\tilde{r}_k(\mu,\nu))$ has a clustering search scale $\alpha_k$ associated with it. Each $\alpha_k$ is obtained by multiplying an estimate of the initial standard deviation of the projected data by a random fraction drawn from a user-determined search range. We optimized $w_j$ by stochastic gradient ascent in $I^*$.
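To make the structure of the original Friedman-Tukey index concrete, the following sketch evaluates the product of a trimmed variance and a pairwise compactness term for a single linear projection. Several choices are assumptions made for illustration only: the kernel $g(r) = R - r$, the rule that trims the $m$ projections farthest from the median, the synthetic data, and the function name `friedman_tukey_index`. It does not implement the modified index $I^*$ of [8], its nonlinearity, or the coupling matrix.

```python
import numpy as np

def friedman_tukey_index(F, w, R, m=0):
    """Product of a trimmed variance S and a compactness term N.

    F: (M_p, d) array whose rows are sample vectors f(mu).
    w: projection vector (normalised to unit length here).
    R: compactness (cluster) scale.
    m: number of outlying projections dropped from the variance.
    """
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)
    c = F @ w  # linear projections c(mu) = w . f(mu)

    # Trimmed variance. Assumption: drop the m projections that lie
    # farthest from the median before computing the variance.
    order = np.argsort(np.abs(c - np.median(c)))
    kept = c[order[: len(c) - m]]
    S = np.mean((kept - kept.mean()) ** 2)

    # Compactness: sum of g(r) over all sample pairs with r < R.
    # Assumption: g(r) = R - r, one monotonically decreasing choice.
    r = np.abs(c[:, None] - c[None, :])
    N = np.sum(np.where(r < R, R - r, 0.0))
    return S * N

# Synthetic data: axis 0 holds two tight clusters, axis 1 a broad Gaussian
# of similar overall variance.
rng = np.random.default_rng(0)
n = 300
clustered = rng.choice([-3.0, 3.0], n) + rng.normal(0.0, 0.2, n)
diffuse = rng.normal(0.0, 3.0, n)
F = np.column_stack([clustered, diffuse])

index_clustered = friedman_tukey_index(F, [1.0, 0.0], R=0.5, m=5)
index_diffuse = friedman_tukey_index(F, [0.0, 1.0], R=0.5, m=5)
print(index_clustered > index_diffuse)
```

Both axes have similar variance, so the difference in the index comes almost entirely from the compactness term, which rewards projections in which many sample pairs fall within the scale $R$ of one another.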

While the approach that we defined in [8] does not specifically aim to derive an orthonormal PP filter set, it did incorporate a mechanism for decorrelating projections in the stochastic optimization process. Essentially, a coupling matrix, labeled $L_{ij}$ above, is defined between the projections, and this matrix is simultaneously optimized along with the projections in such a way that the relative entropy between the projections is maximized (decorrelation). The degree of decorrelation can be controlled by altering the size of the initial coupling and the relative rates of optimization of the relative entropy cost function used for the coupling and the PP cost function used for the projections. Additional implementation details can be found in [6] and [8].

Myrica cerifera (Bayberry) Thicket & Pine/Hardwood Predicted
Distributions (PP-ISODATA Hybrid)

Fig. 6. Models associated with upland thickets and tree stands for
(top row, left) PP-ISODATA and (top row, right) the first BPCE model
based on 112 spectral samples. Distributions for the two are largely
consistent, with the exception that the unsupervised approach has
included an area of glint in the surf zone of the eastern shore.
(Middle row, left) Spectral reflectance plots for HyMAP data at GPS
waypoints associated with Myrica cerifera; (middle row, right) mean
and standard deviation of PP-ISODATA category (includes glint zone);
(bottom row, left) mean and standard deviation of upland thicket and
tree stands distribution predicted by the BPCE model; (bottom row,
right) ASD reflectance measurement of Myrica cerifera leaves taken on
May 11, 2001. Note that the relative height of the first peak in the
NIR is somewhat higher in the ASD measurement (gaps are atmospheric
absorption windows that were removed because spectrometer counts
created numerical instabilities in the reflectance calculation), and
overall reflectance is slightly higher in the ASD measurement.

To optimize the unsupervised PP and PCA filters, we used either the
end-members associated with our GPS and DGPS ground data surveys or,
in some instances, larger spectral subsets derived from the southern
end of the island that were representative of the typical spectral
variation seen in the data. PP and PCA filters were derived from
either 1 × 1 × 126 or 3 × 3 × 126 spatial-spectral windows. For the
supervised classification models, described in Section IV-B, we
always used the spectral end-members associated with our GPS and DGPS
surveys. In the latter case, because the size and shape of these ROIs
were quite variable, we restricted ourselves to inputs that were
1 × 1 × 126 (single-pixel spectra).
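
The two input shapes mentioned above can be illustrated with a short sketch that flattens a spatial-spectral window into a feature vector. The cube layout `cube[row][col][band]`, the function name, and the rule of skipping pixels whose window would leave the image are assumptions made for illustration only.

```python
def extract_window(cube, row, col, size):
    """Flatten a size x size x bands window centred on (row, col).

    cube is indexed as cube[row][col][band]. Returning None for pixels
    too close to the image edge is an assumption, not the paper's rule.
    """
    half = size // 2
    n_rows, n_cols = len(cube), len(cube[0])
    if not (half <= row < n_rows - half and half <= col < n_cols - half):
        return None
    features = []
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            features.extend(cube[r][c])  # append all bands of this pixel
    return features

BANDS = 126
# Toy 5 x 5 cube; every band of pixel (r, c) holds the value r * 10 + c.
cube = [[[float(r * 10 + c)] * BANDS for c in range(5)] for r in range(5)]
single = extract_window(cube, 2, 2, size=1)  # 1 x 1 x 126 input
window = extract_window(cube, 2, 2, size=3)  # 3 x 3 x 126 input
```

A 3 × 3 × 126 window yields a vector nine times longer than a single-pixel spectrum, and it can only be formed where all nine pixels exist, which is one practical reason isolated waypoints favor 1 × 1 × 126 inputs.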

The feature extraction stage of the unsupervised classification
models considered in our experiments used either 1) projection
pursuit or 2) principal component analysis. The final stage of the
process was the ISODATA [45] algorithm.


B.  Supervised  Classification  Models 

Fig. 7. (a) Overall classification accuracies for ten candidate
classification models, showing relatively similar average performance
across algorithms and model architectures.

In all supervised classification models considered in this paper,
the final stage of classification was a variant of the backward
propagation of error model [41] with a cross-entropy cost function
(BPCE) [40]. The BPCE cost function is

$$E(x) = -\sum_i \left[ (1 - d_i(x)) \ln(1 - c_i(x)) + d_i(x) \ln c_i(x) \right] \qquad (13)$$

where $d_i$ is the desired output, either 0 or 1, for one of the
category nodes at the output of the model, and $c_i$ is the actual
response of the output node to a particular input pattern propagated
forward through the model. We use the cross-entropy cost function
because it is less prone to local minima than the originally proposed
least mean-square (LMS) error [40], owing to the form of the gradient
used in the stochastic gradient descent. Comparing this with the more
commonly used LMS error, $S(x) = \sum_i (c_i(x) - d_i(x))^2$, defined
in [41], it can be seen that the cross-entropy cost function
eliminates a factor in the gradient descent rule
$\Delta w_{ij}(t) = -\eta(t)\, \partial S / \partial w_{ij}$ for the
weight vectors $w_i$. Specifically, for LMS, the derivative of the
transfer function, $c_i(x)(1 - c_i(x))$, that appears in the gradient
in the last layer weights, and in earlier layers through the
backpropagation of error, can cause the updates to become “frozen”
near zero when $c_i(x)$ is antipodal to the desired response. The
latter occurs because the derivative of the transfer function has two
zero crossings. The expression is also zero when the response is near
the desired response, but it is the antipodal response that causes
the undesirable behavior. The form of (13) eliminates the second zero
crossing that causes this behavior because the logarithms contribute
an extra factor to the gradient that cancels the derivative of the
transfer function.
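
The freezing effect discussed above can be checked numerically. The sketch below assumes a logistic transfer function, which is consistent with the derivative c(1 - c) quoted in the text, and compares the error signal at a single output node for the two cost functions when the response is antipodal to the desired output. The function names are illustrative.

```python
import math

def sigmoid(a):
    """Logistic transfer function (assumed form of the output node)."""
    return 1.0 / (1.0 + math.exp(-a))

def lms_delta(a, d):
    """dS/da for S = (c - d)^2 with c = sigmoid(a)."""
    c = sigmoid(a)
    return 2.0 * (c - d) * c * (1.0 - c)

def cross_entropy_delta(a, d):
    """dE/da for E = -[(1 - d) ln(1 - c) + d ln c] with c = sigmoid(a).

    dE/dc = (c - d) / (c (1 - c)), so the factor c (1 - c) cancels.
    """
    c = sigmoid(a)
    return c - d

# Antipodal case: the node is saturated near 0 but the desired output is 1.
a, d = -10.0, 1.0
lms = lms_delta(a, d)
ce = cross_entropy_delta(a, d)
```

For this input the LMS error signal is on the order of 1e-4 in magnitude, whereas the cross-entropy signal stays close to -1, so the weight update remains large exactly when the node is most wrong.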

One additional feature of our supervised classification models
was the use of an error-resampling buffer, which increased the
frequency with which spectra causing misclassifications were
presented to the model. This forces filter adjustments to improve
the model on boundaries between land cover categories where errors
are more likely. This is particularly useful when some categories
are sparsely represented, as is the case in this application.
Details of this error-resampling buffer are beyond the scope of
this paper, but this approach tends to accelerate model convergence
and can lead to higher asymptotic classification rates [7].

HyMAP reflectance data corresponding to the spectral end-member
sets delimited by the GPS and DGPS ground measurements for each
category were the input to the model. These data were divided into
three groups, one for training and two for testing generalization,
as described in greater detail in Section V. A few unreliable bands
were eliminated in the vicinity of the two major atmospheric
absorption windows. In some of the models described in Section V,
the data were first projected into a lower dimensional set of
features using either preoptimized filters derived with the PP
algorithm described in Section IV-A or PCA. In these models, the
input to the BPCE model consisted of the lower dimensional feature
vector (see Fig. 3); in other models, the spectral end-members were
input directly to the BPCE model.

Fig. 7. (Continued). Relative category abundances for (b)
cross-validation test set and (c) sequestered test set.

When PCA was the precursor stage of processing, we retained the
first 42 eigenvectors. This number of features may have been
excessive from the standpoint of noise reduction in most
categories, since all but 1.7 × 10^-3 % of the variance is explained
with this many components; however, it ensured that we would not be
discarding small-scale spectral features that might permit
discrimination of highly similar but distinct land cover types. For
the results described in this study, when PP was the precursor
stage, we used 32 PP projection vectors that were optimized before
insertion in the end-to-end classification model.
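
The residual-variance figure quoted above for PCA is the share of the eigenvalue sum carried by the discarded components. A minimal sketch of that calculation follows; the eigenvalue spectrum used here is hypothetical and is not the HyMAP covariance spectrum.

```python
def residual_variance_percent(eigenvalues, n_keep):
    """Percent of total variance NOT explained by the leading n_keep components."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return 100.0 * sum(ordered[n_keep:]) / total

# Hypothetical, rapidly decaying spectrum for 126 bands (illustrative only).
eigenvalues = [0.5 ** k for k in range(126)]
left_out = residual_variance_percent(eigenvalues, n_keep=42)
```

With real data the same function can be evaluated for several values of `n_keep` to see how quickly the residual falls, which is the trade-off the text describes between noise reduction and retaining small-scale spectral features.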

A variety of model architectures and complexities were explored
using this framework, and the performance of a set of ten exemplar
models is shown in Section V. An analysis of the variability in the
models suggested that smoothing in functional (classifier) space
might achieve more reliable results.

Fig. 7. (Continued). Performance versus category for the ten
models: (d) cross-validation test set.


V.  Results  and  Discussion 

Both supervised and unsupervised classification models of the
land cover were produced. Of these, the first supervised
classification maps consisted of 16 of the 20 land cover categories
in Table I, with one aggregate category that combined Andropogon
spp. and Ammophila breviligulata into a “backdune” category. These
first models used 112 ground-referenced spectral end-members
(Fig. 4). Many of the categories, such as Myrica cerifera Thicket,
Distichlis spicata, Spartina alterniflora, Backdune vegetation, and
“Wrack” (Fig. 2), appear to have produced consistent results based
on our field observations and historical data [32], [35]. A few
categories were problematic, however, and these included Phragmites
australis, Juncus roemerianus, and Uniola paniculata, all of which
had high false-alarm rates. Our first models did not attempt to
distinguish the Myrica cerifera thicket from the pine-hardwood
complex, particularly since Myrica cerifera typically appears in
the understory of these tree stands (Fig. 2), and our field surveys
had not at that point sufficiently documented the location of
representative pine and hardwood stands. Subsequent models
described below did include a distinction between these two land
cover types after additional ground data had been acquired.


After the development of the first automatic land cover models,
we visited the island to obtain additional survey data for
validating the results. During the visit of May 7-11, 2001, a year
after the initial HyMAP data acquisition, we collected ASD FR in
situ spectra and a large number of additional survey points
(examples of ASD spectra appear in Figs. 6 and 8). Follow-up visits
in August and October 2001 and May 2002 (while this paper was being
revised) established more accurate ground data using DGPS as
described above. Although the temporal gap between airborne and
ground data acquisitions is not ideal, the interval is short enough
for many of the categories that survey data would still be
reliable. Exceptions to this are mudflats/salt pannes and wrack,
although the dominant distribution of wrack at the mean high tide
level is relatively stable.

The second set of supervised classification models that we
produced comprised 19 of the 20 categories (Figs. 4 and 5) listed
in Table I, omitting the foredune vegetation, which is often sparse
or nascent in the early part of May in our study area. (Models
described in future papers using PROBE2 data acquired in the
summer, when this vegetation is fully present, will include this
category.) In these experiments, we took advantage of the
additional spectral data labeled during the DGPS surveys. The new
experiments with the expanded set of labeled spectra included 3656
training samples spread across the 19 categories previously
described. Additionally, two test sets were set aside, one for
cross validation, which was used to determine a stopping point for
optimization with the training set, and




IEEE  TRANSACTIONS  ON  GEOSCIENCE  AND  REMOTE  SENSING,  VOL.  40,  NO.  10,  OCTOBER  2002 


[Figure: Sequestered Test Set Accuracy. Horizontal axis: Land Cover
Category; vertical axis: 0% to 100%; legend: the ten candidate models.]

Fig. 7. (Continued). Performance versus category for the ten
models: (e) sequestered test set.


one sequestered test set, used as a second estimate of expected
generalization capability. The cross-validation test set contained
2049 spectral samples, while the sequestered test set consisted of
2836 spectral samples. These models showed immediate improvement
because of the improved georectification and the larger set of
georeferenced spectral end-members that were used to train them.
Particularly noticeable was the large reduction in false-alarm
rates for Phragmites australis, Juncus roemerianus, and Uniola
paniculata. Heavily inundated portions of the northern salt marsh
that were declared as water in the earlier models are now declared
as either Spartina alterniflora or Mudflat, and the surf zone,
where glint was present, is now correctly labeled as Water rather
than Beach/sand (this is a high-tide result, so much of the beach
is under water). The amount of Mudflat declared is probably too
large and is the result of a number of factors, including the early
stage of growth, the sparseness of the Spartina alterniflora in
areas of heavy inundation, the fact that spectra used to model the
Mudflat category may also have been partially inundated, and the
occurrence in mudflats of sparse vegetation or small deposits of
wrack. Nevertheless, the overall categorization of Spartina
alterniflora in the northern end of the island is significantly
improved.
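
The role of the cross-validation set as a stopping criterion, described above, can be sketched as follows. The patience rule, the function name, and the toy accuracy curve are assumptions made for illustration; the text does not specify the exact stopping rule. The sequestered test set would be scored only once, after the stopping point is fixed.

```python
def train_with_early_stopping(train_step, validate, max_epochs=100, patience=5):
    """Return (best_epoch, best_accuracy) measured on the cross-validation set.

    train_step(epoch) performs one pass over the training set and
    validate() returns cross-validation accuracy. The patience rule is
    an assumption; the source does not give the stopping criterion.
    """
    best_epoch, best_acc, waited = -1, float("-inf"), 0
    for epoch in range(max_epochs):
        train_step(epoch)
        acc = validate()
        if acc > best_acc:
            best_epoch, best_acc, waited = epoch, acc, 0
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_acc

# Toy stand-in for a model: validation accuracy rises, then degrades.
curve = [0.60, 0.70, 0.76, 0.79, 0.78, 0.77, 0.77, 0.75, 0.74, 0.73]
state = {"epoch": 0}
best_epoch, best_acc = train_with_early_stopping(
    train_step=lambda e: state.update(epoch=e),
    validate=lambda: curve[state["epoch"]],
    max_epochs=len(curve),
    patience=3,
)
```

Because the cross-validation set influences when training stops, its accuracy is an optimistic estimate, which is why a second, sequestered set is a useful additional check on generalization.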

Fig. 4 illustrates the unsupervised approach, depicting an RGB
composite of three PP projections and a 34-category PP-ISODATA
category map derived from this and two other PP projections. In
contrasting this with the supervised classifications shown in
Fig. 4, it can be seen that a number of the categories in the two
approaches are correlated, although in some cases the PP-ISODATA
map has grouped two or more categories that are distinct BPCE
categories, e.g., backdune vegetation and “wrack” are grouped in
the PP-ISODATA map. In some instances, the opposite is true. For
example, the Spartina alterniflora category was divided into two
groups in the PP-ISODATA model, while it is, of course, a single
distribution in the supervised BPCE classifications. The
distinction made by the PP-ISODATA may be related to differences
associated with short versus tall forms of the Spartina
alterniflora (Fig. 2). Taller forms tend to be located near the
berm edge of creeks and channels in the salt marsh at the northern
end of Smith Island, while shorter forms are found in the interior
of the salt marsh where elevation is lower and tidal inundation
greater. Fig. 6 compares the distribution of Myrica cerifera and
tree stands predicted by the original BPCE model using 112 spectral
training samples and the PP-ISODATA category associated with these
vegetation types. While the distributions are similar, one
principal difference is that the PP-ISODATA has aggregated an area
strongly affected by glint, and this leads to distortions when
compared with the BPCE-predicted model and ASD FR in situ
measurements. Several PP-ISODATA categories are coherent in
structure but unlabeled at this point, pending further survey
efforts.

Fig. 7. (Continued). Performance versus category for the ten
models: (f) typical confusion matrix from the first candidate model.

A more exacting test of accuracy was made possible by the DGPS
data that we collected on Smith Island in follow-on surveys.
Although we have taken ground data on five of the Virginia Coast
Reserve barrier islands during these subsequent visits, we have
spent roughly half of that time on Smith Island. While additional
data will be needed to validate PROBE2 data acquired in other
seasons, these DGPS data nevertheless provided us a much higher
degree of precision in determining relative classification
accuracies on Smith Island. As described above, data labeled during
the GPS and DGPS surveys were divided into training data,
cross-validation data (used to stop training of the supervised
models), and sequestered test data, and 19 of the categories listed
in Table I were used in the models. Ten candidate models were
developed using the algorithms described in Section IV. Of the ten
portrayed in Fig. 7, the first five were BPCE models of varying
complexity; models six and seven were composite PP-BPCE models;
models eight and ten were BPCE-BPCE composite models, pooling the
results of several BPCE and BPCE composite models; and model nine
was a PCA-BPCE composite. While overall accuracy on the training
data reached as high as 90% in some of these classifications, a
more important measure is the extent to which these models
generalize to sequestered test data when challenged. Overall
accuracy ranged between 72% and 90% for the training set, between
71% and 80% for the cross-validation set, and between 58% and 69%
for the sequestered test set. Fig. 7 compares the performance of a
set of candidate models that were produced for Smith Island using
the expanded spectral end-member sets derived from the DGPS and GPS
surveys.


Relative abundance of categories in the training and test sets
is also reported in Fig. 7. Fig. 7 shows that for the
cross-validation set, in 13 of the 19 categories, one or more
models lie within the range between 65% and 95% accuracy, while 14
fall within the range between 65% and 98% in the sequestered test
set. Not surprisingly, some of the dominant categories such as
Distichlis spicata, Myrica cerifera Thicket, Water, Pine/Hardwood
Complex, and Wrack are at the top end of this range. At the same
time, there is a high degree of variability in the models. While
part of this is due to differences in algorithms, a significant
contribution is due to the high degree of spectral overlap in many
of the categories present in the early part of the growing season.
One surprising result is the performance for the invasive plant
Phragmites australis. In both test sets, for the category
Phragmites australis, at least one model exceeds the 65% threshold,
obtaining 73% and 68% accuracy, respectively, on the
cross-validation and sequestered test sets. In the southern end of
Smith Island, this invasive species typically grows in the ecotone
between thicket and the marsh vegetation (Fig. 8) and is,
therefore, difficult to detect due to mixing with other categories,
such as Myrica cerifera Thicket. The left-hand column of spectral
plots in Fig. 8 portrays this mixing, comparing the spectral
response of HyMAP at areas known from our ground survey to consist
of exposed Phragmites, Myrica cerifera, and Phragmites australis
adjacent to the Myrica cerifera thicket. Phragmites australis is
not one of the dominant vegetation types on this island, which also
makes it a challenging category to model. Inspection of the
predicted distributions of Phragmites australis in Figs. 5 and 9
shows that the models based on the expanded set of spectral
end-members have achieved a substantial reduction in false-alarm
rate when compared with the first set of models that used 112
spectral end-members. Phragmites is detected by this model both
along the thicket edge and in areas where it is more exposed. The
PP-ISODATA category most closely associated with Phragmites
australis only detected Phragmites near the thicket; however,
looking at Fig. 9, it can be seen that it lacked the desired
specificity, tending to group other shrubs on the edge with the
Phragmites. The best result was the BPCE model using the expanded
set of spectral inputs, also shown in Fig. 9. While the false-alarm
rate could still be further improved, it shows the most consistent
prediction of Phragmites both in the open and along the thicket
boundary. We conjecture that data acquired on dates later in the
growth cycle may eventually allow us to reduce the false-alarm rate
further. One additional observation concerning the supervised
classification is that the Phragmites results for the PP-BPCE model
and PCA-BPCE model had higher false-alarm rates than the BPCE model
in isolation, but the PP-BPCE model did have a markedly better
false-alarm rate than PCA-BPCE (Fig. 5), which is not surprising
given earlier arguments developed in Section IV.

Fig. 8. (Top row photographs) Exposed Phragmites australis, Myrica
cerifera thicket, and Phragmites near the thicket. Spectral plots,
left column: (top) exposed Phragmites australis, (middle) Phragmites
australis near Myrica cerifera, (bottom) Myrica cerifera. Spectral
plots, top row: (left) mean and standard deviation of PP-ISODATA
category associated with Phragmites australis near thicket, and
(middle) BPCE classification for all Phragmites, exposed and near
thicket; (right) ASD FR spectrum of Phragmites australis. Spectral
plots, bottom row: (left) PP-ISODATA category associated with Myrica
cerifera and tree stands, distorted by glint grouped with the
category, and (middle) BPCE classification of Myrica cerifera;
(right) ASD FR spectrum of lower canopy Myrica cerifera leaves.

Fig. 9. Comparison of model performance for Phragmites australis:
unsupervised versus supervised models (in red). (Top row)
PP-ISODATA; (middle row) BPCE model based on original 112 spectral
samples; (bottom row) BPCE model based on expanded spectral set of
3656 training samples and improved georectification. Also shown:
areas identified as Phragmites australis during DGPS surveys. Zoomed
areas show predictions in vicinity of Phragmites near thicket.
PP-ISODATA and first BPCE model based on 112 spectra were prior to
improved georectification, so the four Phragmites patches shown
(areas in yellow from DGPS) appear shifted toward the bottom of the
figure relative to the predicted distributions in the top and middle
rows.

Looking at the results in Fig. 5, we note that similar
false-alarm problems that had existed for the category Juncus
roemerianus in the original BPCE classification are now largely
corrected in the models based on the expanded spectral set with
improved georeferencing.
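
Per-category accuracy and false-alarm rate, the two quantities discussed above, can both be read from a confusion matrix. The sketch below uses one common definition of false-alarm rate (the fraction of samples from other categories that were labeled as the category in question); the text does not define the term explicitly, and the counts shown are hypothetical rather than the paper's data.

```python
def category_metrics(confusion, k):
    """Accuracy and false-alarm rate for category k.

    confusion[i][j] counts samples of true category i predicted as j.
    The false-alarm definition used here is an assumption (see lead-in).
    """
    n = len(confusion)
    true_k = sum(confusion[k])
    hits = confusion[k][k]
    false_alarms = sum(confusion[i][k] for i in range(n) if i != k)
    others = sum(sum(confusion[i]) for i in range(n) if i != k)
    accuracy = hits / true_k if true_k else 0.0
    false_alarm_rate = false_alarms / others if others else 0.0
    return accuracy, false_alarm_rate

def overall_accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / total

# Hypothetical counts for three categories (not the paper's data).
confusion = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 4, 16],
]
acc1, far1 = category_metrics(confusion, 1)
overall = overall_accuracy(confusion)
```

A sparsely represented category can show acceptable accuracy and still produce many false alarms in a map, because even a small false-alarm rate applied to abundant neighboring categories yields a large number of mislabeled pixels.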

Categories such as Beach/sand, which would ordinarily not be so
difficult to identify, were more problematic because the HyMAP
scene had a significant area of glint in the beach and surf zone on
the eastern shore of Smith Island, due to the high sun angle at the
time of the data collection, and this contributed to the high
degree of variability in performance across the ten candidate
models. The presence of the glint also affected performance for
categories such as Peat Outcrop, although this category can be
challenging in and of itself because of its presence in the surf
zone, depending on whether data are acquired near high or low tide.

Although in many areas the dune vegetation shows the proper
delineation of Andropogon spp. toward the upland and Ammophila
breviligulata toward the beach, the BPCE models based on the
expanded spectral sets all tended to confuse Ammophila
breviligulata with the Andropogon spp. in many of the specific ROIs
used to evaluate accuracy. At this time of year, grasses and sedges
such as these and Spartina patens, also found in the dune
environment, are all tonally similar. For example, Fig. 7(e) shows
that Spartina patens is most often confused with either Distichlis
spicata, the dominant swale grass, or Andropogon spp. Data acquired
in the early fall, when Andropogon is quite distinct visually from
the other two, are likely to improve results. Thus, we expect that
the October PROBE2 data acquisition will achieve higher accuracy
when classification models are developed.

Other sources of difficulty for these models stem from the time
of the year that the HyMAP data were acquired. The beginning of May
is early in the growing season in the VCR, so many vegetation
communities contain a mixture of new growth and senescent or dead
vegetation from the previous growth cycle. Distinguishing
vegetation types such as Distichlis spicata from Scirpus spp. may
be very difficult to achieve spectrally at this time of year, and
this probably accounts for the fact that the majority of errors for
the category Scirpus spp. are the result of confusion with
Distichlis spicata. As we have noted earlier, tidal influences
provide additional sources of spectral variability for many of the
marsh vegetation communities because of variations in degree of
inundation, and they obviously affect the beach zone, depending on
the degree of inundation or wetting present. Many points that we
acquired in the beach zone in the first surveys were effectively
under water due to tidal stage or in an area of strong glint due to
the time of data acquisition and, therefore, could not be used in
the analysis. Although more than one model obtains a respectable
score for the category Beach/sand, the poor performance for this
category in the other models is almost certainly due to the
presence of glint.

VI.  Conclusion 

Our goal was to develop land cover maps that would be useful to
natural resource managers at higher spatial resolution than has
been available previously. Both unsupervised and supervised
classification approaches were used to create these products and to
evaluate their relative merits. We have seen that automatic land
cover classification models can be developed successfully from
HyMAP imagery, even in the early part of the growing season when
spectral differences in vegetation may not be as pronounced. The
expectation is that a more ideal data acquisition date in late
summer or early fall would improve results further. PROBE2 imagery
acquired during those intervals will be used to evaluate this
conjecture. The use of a hyperspectral sensor with spatial
resolution of 4.5 m was deemed necessary in order to be able to
discriminate rapidly varying land cover types seen, e.g., in the
transition zone from the lagoonal shore to the upland. On Smith
Island, six to seven distinct vegetation zones may occur in a
distance as short as 50-75 m.

Some technical difficulties, such as extensive glint present in
the HyMAP data in the beach and surf zone, limited what could be
achieved relative to a more ideal time of day for data collection.
Other challenges stemmed from the fact that the data were acquired
near high tide. Despite these difficulties and the fact that the
early part of the growing season may not be the ideal time for
distinguishing many types of vegetation, we have demonstrated
success in identifying both plant communities and, in some
instances, individual plant species from HyMAP through our field
validation efforts with GPS, DGPS, and in situ reflectance
measurements. Supervised classification models based on spectra
labeled during GPS and DGPS surveys were used to demonstrate that
models could discriminate 19 land cover types. Some of these
categories were defined at the plant community level, with others
being specific plant species.

Although there were differences between in situ measurements and
the airborne hyperspectral data, there were strong correlations in
spectral shape. Unsupervised models based on a PP-ISODATA hybrid
were found to agree with the supervised models for a number of
categories. In some cases, the exploratory PP-ISODATA approach may
have identified subgroups within a major category such as Spartina
alterniflora, for which it was observed that the unsupervised
approach may be dividing the data into low and high vigor forms of
the same species. Without a priori knowledge of pixel labels, the
PP-ISODATA approach was found to be correlated with Phragmites
australis that grows in the margin between marsh and upland;
however, this approach did not identify exposed Phragmites. The
partial success of this exploratory approach also benefited from
the ability to input both spectral and spatial-spectral windows.
Accuracy and specificity of supervised models based on BPCE and
composite models, especially for Phragmites australis, were found
to be highly dependent on the size of the labeled spectral training
set and on the accuracy of the georeferencing. Increasing this
accuracy and expanding the number of spectral samples used in
training provided a significant reduction in false-alarm rate for
multiple categories, including Phragmites. The best model overall
for Phragmites used BPCE and the expanded set of spectral training
samples.